Importing Important Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Data import and understanding
customer_behaviour_df = pd.read_excel("E Commerce Dataset.xlsx", sheet_name="E Comm")
customer_behaviour_df.head()
| | CustomerID | Churn | Tenure | PreferredLoginDevice | CityTier | WarehouseToHome | PreferredPaymentMode | Gender | HourSpendOnApp | NumberOfDeviceRegistered | PreferedOrderCat | SatisfactionScore | MaritalStatus | NumberOfAddress | Complain | OrderAmountHikeFromlastYear | CouponUsed | OrderCount | DaySinceLastOrder | CashbackAmount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 50001 | 1 | 4.0 | Mobile Phone | 3 | 6.0 | Debit Card | Female | 3.0 | 3 | Laptop & Accessory | 2 | Single | 9 | 1 | 11.0 | 1.0 | 1.0 | 5.0 | 159.93 |
| 1 | 50002 | 1 | NaN | Phone | 1 | 8.0 | UPI | Male | 3.0 | 4 | Mobile | 3 | Single | 7 | 1 | 15.0 | 0.0 | 1.0 | 0.0 | 120.90 |
| 2 | 50003 | 1 | NaN | Phone | 1 | 30.0 | Debit Card | Male | 2.0 | 4 | Mobile | 3 | Single | 6 | 1 | 14.0 | 0.0 | 1.0 | 3.0 | 120.28 |
| 3 | 50004 | 1 | 0.0 | Phone | 3 | 15.0 | Debit Card | Male | 2.0 | 4 | Laptop & Accessory | 5 | Single | 8 | 0 | 23.0 | 0.0 | 1.0 | 3.0 | 134.07 |
| 4 | 50005 | 1 | 0.0 | Phone | 1 | 12.0 | CC | Male | NaN | 3 | Mobile | 5 | Single | 3 | 0 | 11.0 | 1.0 | 1.0 | 3.0 | 129.60 |
# Drop the columns that are not used in the analysis
customer_behaviour_df.drop(
    columns=["CustomerID", "CityTier", "WarehouseToHome", "NumberOfAddress", "OrderAmountHikeFromlastYear"],
    inplace=True,
)
customer_behaviour_df.head()
| | Churn | Tenure | PreferredLoginDevice | PreferredPaymentMode | Gender | HourSpendOnApp | NumberOfDeviceRegistered | PreferedOrderCat | SatisfactionScore | MaritalStatus | Complain | CouponUsed | OrderCount | DaySinceLastOrder | CashbackAmount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 4.0 | Mobile Phone | Debit Card | Female | 3.0 | 3 | Laptop & Accessory | 2 | Single | 1 | 1.0 | 1.0 | 5.0 | 159.93 |
| 1 | 1 | NaN | Phone | UPI | Male | 3.0 | 4 | Mobile | 3 | Single | 1 | 0.0 | 1.0 | 0.0 | 120.90 |
| 2 | 1 | NaN | Phone | Debit Card | Male | 2.0 | 4 | Mobile | 3 | Single | 1 | 0.0 | 1.0 | 3.0 | 120.28 |
| 3 | 1 | 0.0 | Phone | Debit Card | Male | 2.0 | 4 | Laptop & Accessory | 5 | Single | 0 | 0.0 | 1.0 | 3.0 | 134.07 |
| 4 | 1 | 0.0 | Phone | CC | Male | NaN | 3 | Mobile | 5 | Single | 0 | 1.0 | 1.0 | 3.0 | 129.60 |
# Merge the duplicate label "Mobile" into "Mobile Phone" in the preferred order category
customer_behaviour_df["PreferedOrderCat"] = customer_behaviour_df["PreferedOrderCat"].replace("Mobile", "Mobile Phone")
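The same clean-up may apply to other categorical columns; for example, the preview above shows both "Phone" and "Mobile Phone" in PreferredLoginDevice. The sketch below is a minimal illustration on a made-up frame. The `canonical_labels` mapping is hypothetical, and treating "Phone" as "Mobile Phone" is an assumption that should be confirmed against the data dictionary. The "Computer" value is invented for the demo.

```python
import pandas as pd

# Made-up sample; only the column names come from the notebook.
demo = pd.DataFrame({
    "PreferredLoginDevice": ["Mobile Phone", "Phone", "Phone", "Computer"],
    "PreferedOrderCat": ["Laptop & Accessory", "Mobile", "Mobile Phone", "Mobile"],
})

# Hypothetical mapping from variant labels to one canonical label per column.
canonical_labels = {
    "PreferredLoginDevice": {"Phone": "Mobile Phone"},  # assumption
    "PreferedOrderCat": {"Mobile": "Mobile Phone"},
}

for col, mapping in canonical_labels.items():
    demo[col] = demo[col].replace(mapping)

# Count the labels after merging to confirm no variants remain.
label_counts = {col: demo[col].value_counts().to_dict() for col in canonical_labels}
print(label_counts)
```

Running `value_counts()` on each categorical column before writing the mapping is a quick way to see which variants actually occur.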
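# Plot the distribution of each numeric column with missing values (Tenure, HourSpendOnApp,
# CouponUsed, OrderCount, DaySinceLastOrder) to decide which fillna method to use
# Assumed definition: all numeric columns (the original definition is not shown)
customer_behaviour_numerical = customer_behaviour_df.select_dtypes(include="number").columns
for col in customer_behaviour_numerical:
    if customer_behaviour_df[col].isnull().any():
        # Drop NaNs first; automatic binning fails on missing values
        plt.hist(customer_behaviour_df[col].dropna(), bins='auto', density=True)
        plt.title(col)
        plt.show()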
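One common way to act on such histograms is to fill with the median when a column is clearly skewed and with the mean otherwise. The sketch below illustrates that rule on a made-up frame; `impute_by_skew` is a hypothetical helper and the skewness threshold of 1.0 is an assumption, not a choice taken from the notebook.

```python
import numpy as np
import pandas as pd

def impute_by_skew(df, threshold=1.0):
    """Fill NaNs in numeric columns: median if |skew| > threshold, else mean."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        if out[col].isnull().any():
            skewed = abs(out[col].skew()) > threshold
            fill_value = out[col].median() if skewed else out[col].mean()
            out[col] = out[col].fillna(fill_value)
    return out

# Made-up values: Tenure has one large outlier, HourSpendOnApp is roughly symmetric.
demo = pd.DataFrame({
    "Tenure": [0.0, 1.0, 2.0, 3.0, np.nan, 60.0],
    "HourSpendOnApp": [2.0, 3.0, np.nan, 3.0, 2.0, 4.0],
})

filled = impute_by_skew(demo)
print(filled)
```

The median is less sensitive to a long tail, which is why it is usually preferred for skewed columns such as tenure or days since last order.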
Customer Demographic and Preferences
● To what extent does the time spent on the app relate to how frequently customers make purchases (Order count)?
● Does the relationship between app usage and purchase frequency differ significantly between male and female customers?
● How does the preferred login device relate to the amount of time spent on the app?
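A minimal sketch of how these questions could be approached, assuming Pearson correlation for the two numeric columns and a comparison of group means for the categorical login device. The data is made up so that the overall correlation hides opposite trends in the two gender groups; only the column names come from the dataset.

```python
import pandas as pd

# Made-up sample using the notebook's column names.
demo = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "PreferredLoginDevice": ["Mobile Phone", "Computer", "Mobile Phone",
                             "Computer", "Mobile Phone", "Computer"],
    "HourSpendOnApp": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "OrderCount": [1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
})

# Correlation between app time and order count over all customers.
overall_corr = demo["HourSpendOnApp"].corr(demo["OrderCount"])

# The same correlation computed separately for each gender.
corr_by_gender = {
    gender: group["HourSpendOnApp"].corr(group["OrderCount"])
    for gender, group in demo.groupby("Gender")
}

# Login device is categorical, so compare mean app time per device instead.
hours_by_device = demo.groupby("PreferredLoginDevice")["HourSpendOnApp"].mean()

print(overall_corr, corr_by_gender)
print(hours_by_device)
```

Splitting by group before correlating matters because a pooled coefficient near zero can mask strong but opposite relationships within subgroups.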
Customer Behavior and Engagement
● Are there any patterns forming between Preferred Order category and gender?
● Does the preferred order category have any correlation with marital status?
● Does the number of orders change based on the coupons and cashback that the customer receives?
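These questions could be explored with a row-normalised cross-tabulation for the categorical pairs and a correlation table for the numeric ones. The sketch below uses made-up values and is only an illustration of the approach; swapping "Gender" for "MaritalStatus" gives the second question.

```python
import pandas as pd

# Made-up sample using the notebook's column names.
demo = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "PreferedOrderCat": ["Mobile Phone", "Laptop & Accessory",
                         "Mobile Phone", "Mobile Phone"],
    "CouponUsed": [0.0, 1.0, 2.0, 3.0],
    "OrderCount": [1.0, 2.0, 3.0, 4.0],
    "CashbackAmount": [160.0, 150.0, 140.0, 130.0],
})

# Share of each order category within each gender (rows sum to 1).
category_share = pd.crosstab(
    demo["Gender"], demo["PreferedOrderCat"], normalize="index"
)

# Correlation of coupons and cashback with the number of orders.
order_corr = demo[["CouponUsed", "CashbackAmount", "OrderCount"]].corr()["OrderCount"]

print(category_share)
print(order_corr)
```

Normalising by row makes the groups comparable even when one gender or marital status has far more customers than another.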
Customer Churn and Retention
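● How are satisfaction scores distributed across customers?
● How does the churn rate differ between customers who complained and those who did not?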
● What is the correlation of satisfaction score and complaints with the churn rate?
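Because Churn is coded as 0/1, its mean within a group is that group's churn rate. The sketch below shows this on made-up data, grouping by complaint flag, by satisfaction score, and by both together; it is one possible approach rather than the notebook's actual analysis.

```python
import pandas as pd

# Made-up sample using the notebook's column names.
demo = pd.DataFrame({
    "SatisfactionScore": [2, 2, 3, 3, 5, 5],
    "Complain": [1, 0, 1, 0, 1, 0],
    "Churn": [1, 0, 1, 0, 1, 1],
})

# Churn rate for customers with and without a complaint.
churn_by_complaint = demo.groupby("Complain")["Churn"].mean()

# Churn rate at each satisfaction score.
churn_by_score = demo.groupby("SatisfactionScore")["Churn"].mean()

# Churn rate for every score and complaint combination.
churn_grid = demo.pivot_table(
    index="SatisfactionScore", columns="Complain", values="Churn", aggfunc="mean"
)

print(churn_by_complaint)
print(churn_by_score)
print(churn_grid)
```

The grid can be passed to `sns.heatmap` to visualise where churn concentrates.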
Customer Satisfaction and Feedback
● Does tenure have an impact on satisfaction scores and the number of complaints raised?
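One way to look at this is to bucket tenure into bands and compare the mean satisfaction score and complaint rate per band. The band edges and labels below are assumptions chosen for illustration, and the data is made up.

```python
import pandas as pd

# Made-up sample using the notebook's column names.
demo = pd.DataFrame({
    "Tenure": [0.0, 4.0, 8.0, 15.0, 20.0, 30.0],
    "SatisfactionScore": [2, 4, 3, 5, 3, 1],
    "Complain": [1, 1, 0, 0, 0, 1],
})

# Assumed band edges; right=False makes each band include its lower edge.
tenure_edges = [0, 6, 12, 24, 61]
tenure_labels = ["0-5", "6-11", "12-23", "24+"]
demo["TenureBand"] = pd.cut(
    demo["Tenure"], bins=tenure_edges, labels=tenure_labels, right=False
)

# Mean satisfaction and share of customers who complained in each band.
summary = demo.groupby("TenureBand", observed=True).agg(
    mean_satisfaction=("SatisfactionScore", "mean"),
    complaint_rate=("Complain", "mean"),
)

print(summary)
```

Rows with a missing Tenure fall outside every band and are dropped by the groupby, so imputing Tenure first changes which customers are counted.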
Conclusion